Non-distributional Word Vector Representations
Data-driven representation learning for words is a technique of central
importance in NLP. While indisputably useful as a source of features in
downstream tasks, such vectors tend to consist of uninterpretable components
whose relationship to the categories of traditional lexical semantic theories
is tenuous at best. We present a method for constructing interpretable word
vectors from hand-crafted linguistic resources such as WordNet and FrameNet.
These vectors are binary (i.e., they contain only 0s and 1s) and are 99.9% sparse. We
analyze their performance on state-of-the-art evaluation methods for
distributional models of word vectors and find that they are competitive with
standard distributional approaches. Comment: Proceedings of ACL 2015
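
To make the construction concrete, here is a minimal sketch of one way to build such binary, sparse vectors from WordNet via NLTK. The feature inventory and naming are illustrative assumptions, not the paper's exact scheme, and it requires the NLTK WordNet corpus (nltk.download('wordnet')):

```python
# A minimal sketch, assuming NLTK with the WordNet corpus installed;
# the feature inventory here is illustrative, not the paper's exact scheme.
from nltk.corpus import wordnet as wn

def linguistic_features(word):
    """Collect symbolic features for a word from WordNet."""
    feats = set()
    for synset in wn.synsets(word):
        feats.add("SYNSET:" + synset.name())
        feats.add("POS:" + synset.pos())
        for hyper in synset.hypernyms():
            feats.add("HYPERNYM:" + hyper.name())
    return feats

def binary_vectors(vocab):
    """Map each word to a 0/1 vector over the union of observed features."""
    word_feats = {w: linguistic_features(w) for w in vocab}
    dims = sorted(set().union(*word_feats.values()))
    index = {f: i for i, f in enumerate(dims)}
    vectors = {}
    for w, feats in word_feats.items():
        vec = [0] * len(dims)
        for f in feats:
            vec[index[f]] = 1
        vectors[w] = vec
    return vectors, dims

vectors, dims = binary_vectors(["dog", "cat", "car"])
print(len(dims), sum(vectors["dog"]))  # total dimensions vs. active features for "dog"
```

Over a realistic vocabulary and feature inventory, each vector is binary and overwhelmingly sparse, matching the abstract's description.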
Correlation-based Intrinsic Evaluation of Word Vector Representations
We introduce QVEC-CCA, an intrinsic evaluation metric for word vector
representations based on correlations of learned vectors with features
extracted from linguistic resources. We show that QVEC-CCA scores are an
effective proxy for a range of extrinsic semantic and syntactic tasks. We also
show that the proposed evaluation obtains higher and more consistent
correlations with downstream tasks, compared to existing approaches to
intrinsic evaluation of word vectors that are based on word similarity. Comment: RepEval 2016, 5 pages
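
As an illustration of the idea, the following hedged sketch computes a QVEC-CCA-style score with scikit-learn's CCA: a one-component canonical correlation between a matrix of learned vectors and a row-aligned matrix of linguistic features. The data is synthetic and this is not the authors' reference implementation:

```python
# A sketch of a QVEC-CCA-style score: correlate one-component CCA
# projections of the two views (learned vectors vs. linguistic features).
import numpy as np
from sklearn.cross_decomposition import CCA

def qvec_cca_score(word_vectors, linguistic_features):
    """word_vectors: (n_words, d) learned embeddings.
    linguistic_features: (n_words, k) features from a lexical resource,
    with rows aligned to the same words."""
    cca = CCA(n_components=1)
    x_proj, y_proj = cca.fit_transform(word_vectors, linguistic_features)
    return np.corrcoef(x_proj[:, 0], y_proj[:, 0])[0, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))       # stand-in embeddings
S = X @ rng.normal(size=(50, 10))    # toy "linguistic" features correlated with X
print(qvec_cca_score(X, S))          # close to 1.0 for strongly correlated views
```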
Automatic correction of disfluent spoken queries
A user’s interaction with a virtual assistant typically involves spoken requests, queries, and commands, which often include disfluencies. This disclosure describes techniques to automatically correct disfluent queries. Per techniques of this disclosure, a disfluency correction machine learning model is utilized to convert a disfluent query to a corresponding fluent query. Lexical features extracted from the disfluent query are used to determine the portion of the query that is removed to render it fluent. The model is trained using pairs of disfluent and corresponding fluent queries.
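
One way to realize the described removal step, sketched here as an assumption rather than the disclosure's actual model, is a per-token keep/remove classifier over simple lexical features, trained on aligned disfluent/fluent pairs:

```python
# A toy sketch: tag each token REMOVE (1) / KEEP (0) with a classifier
# over lexical features, then emit the kept tokens as the fluent query.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def token_features(tokens, i):
    """Lexical features for token i: the word itself, a filler-word flag,
    and repetition of adjacent tokens (a common disfluency cue)."""
    return {
        "word": tokens[i].lower(),
        "is_filler": tokens[i].lower() in {"um", "uh", "er"},
        "repeats_prev": i > 0 and tokens[i].lower() == tokens[i - 1].lower(),
        "repeats_next": i + 1 < len(tokens) and tokens[i].lower() == tokens[i + 1].lower(),
    }

# Toy training pairs: (disfluent tokens, per-token 1 = remove).
pairs = [
    ("um call mom".split(), [1, 0, 0]),
    ("play play some jazz".split(), [1, 0, 0, 0]),
    ("uh set a timer".split(), [1, 0, 0, 0]),
    ("text alice".split(), [0, 0]),
]
vec = DictVectorizer()
X = vec.fit_transform(token_features(t, i) for t, _ in pairs for i in range(len(t)))
y = [lab for _, labs in pairs for lab in labs]
# Weak regularization (C=100) so the model fits this tiny toy set closely.
clf = LogisticRegression(C=100).fit(X, y)

def make_fluent(query):
    tokens = query.split()
    feats = vec.transform(token_features(tokens, i) for i in range(len(tokens)))
    remove = clf.predict(feats)
    return " ".join(t for t, r in zip(tokens, remove) if not r)

print(make_fluent("um play play some jazz"))  # intended: "play some jazz"
```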
Contextual Error Correction in Automatic Speech Recognition
This disclosure describes techniques that leverage the context of a conversation between a user and a virtual assistant to correct errors in automatic speech recognition (ASR). Once confirmed by the user, the correction event is used to augment the training data for the ASR system.
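
A speculative sketch of the described flow, where the scoring scheme and function names are assumptions rather than details from the disclosure: bias n-best ASR hypotheses toward words from the conversation context, and log user-confirmed corrections as new training pairs:

```python
# Assumed illustration: context-aware rescoring of ASR n-best lists,
# plus logging of user-confirmed corrections for later retraining.
def rescore(nbest, context_words):
    """nbest: list of (hypothesis, asr_score). Prefer hypotheses that
    reuse words already seen in the conversation context."""
    def context_bonus(hyp):
        return sum(w in context_words for w in hyp.split())
    return max(nbest, key=lambda h: h[1] + 0.5 * context_bonus(h[0]))

corrections = []  # (original hypothesis, corrected text) pairs for retraining

def log_confirmed_correction(original, corrected, user_confirmed):
    """Keep a correction as ASR training data only once the user confirms it."""
    if user_confirmed:
        corrections.append((original, corrected))

context = {"speech", "recognize", "recognition"}
nbest = [("wreck a nice beach", -1.2), ("recognize speech", -1.6)]
print(rescore(nbest, context))  # context flips the raw ASR ranking
```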
Learning Word Representations with Hierarchical Sparse Coding
We propose a new method for learning word representations using hierarchical
regularization in sparse coding inspired by the linguistic study of word
meanings. We show an efficient learning algorithm based on stochastic proximal
methods that is significantly faster than previous approaches, making it
possible to perform hierarchical sparse coding on a corpus of billions of word
tokens. Experiments on various benchmark tasks (word similarity ranking,
analogies, sentence completion, and sentiment analysis) demonstrate that the
method outperforms or is competitive with state-of-the-art methods. Our word
representations are available at http://www.ark.cs.cmu.edu/dyogatam/wordvecs/
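
For readers unfamiliar with the optimization, the sketch below shows the general technique on toy data: proximal gradient steps for sparse coding under a tree-structured group penalty, where the proximal operator applies block soft-thresholding to nested groups in leaves-to-root order. The paper's actual tree, penalty weights, and algorithm differ:

```python
# A condensed sketch of proximal gradient sparse coding with a
# tree-structured group-lasso penalty (illustrative, not the paper's code).
import numpy as np

def prox_group(a, groups, lam):
    """Block soft-thresholding applied group by group. For nested
    (tree-structured) groups ordered leaves-to-root, applying the
    groups in sequence computes the proximal operator of their sum."""
    a = a.copy()
    for g in groups:
        norm = np.linalg.norm(a[g])
        a[g] *= max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
    return a

def sparse_code_step(x, D, a, groups, lam=0.1, lr=0.01):
    """One proximal gradient step on 0.5*||x - D a||^2 + lam * Omega(a)."""
    grad = D.T @ (D @ a - x)  # gradient of the smooth reconstruction loss
    return prox_group(a - lr * grad, groups, lr * lam)

rng = np.random.default_rng(0)
D = rng.normal(size=(50, 8))   # dictionary: 50-dim observations, 8 code dimensions
x = rng.normal(size=50)        # one observation (e.g., a word's context statistics)
a = np.zeros(8)
# Nested groups over code indices, children before parents (leaves-to-root).
groups = [np.arange(4, 8), np.arange(2, 8), np.arange(0, 8)]
for _ in range(200):
    a = sparse_code_step(x, D, a, groups)
print(np.round(a, 3))  # coordinates covered by more groups shrink more toward 0
```

Sampling a fresh observation x at each step turns this into the stochastic variant the abstract refers to, which is what makes billion-token corpora tractable.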